19 research outputs found

    Policy Gradients for Probabilistic Constrained Reinforcement Learning

    This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the system in a safe set with high probability. This notion differs from the cumulative constraints often considered in the literature. The challenge of working with probabilistic safety is the lack of expressions for the gradients of probabilistic constraints. Indeed, policy optimization algorithms rely on gradients of the objective function and the constraints. To the best of our knowledge, this work is the first to provide such explicit gradient expressions for probabilistic constraints. It is worth noting that the gradient of this family of constraints can be applied to various policy-based algorithms. We demonstrate empirically that it is possible to handle probabilistic constraints in a continuous navigation problem.

    A Multi-Channel Neural Graphical Event Model with Negative Evidence

    Event datasets are sequences of events of various types occurring irregularly over the timeline, and they are increasingly prevalent in numerous domains. Existing work on modeling events using conditional intensities relies either on some underlying parametric form to capture historical dependencies, or on non-parametric models that focus primarily on tasks such as prediction. We propose a non-parametric deep neural network approach to estimate the underlying intensity functions. We use a novel multi-channel RNN that optimally reinforces the negative evidence of no observable events by introducing fake event epochs within each consecutive inter-event interval. We evaluate our method against state-of-the-art baselines on model fitting tasks as gauged by log-likelihood. Through experiments on both synthetic and real-world datasets, we find that our proposed approach outperforms existing baselines on most of the datasets studied. Comment: AAAI 202

    A computational architecture to address combinatorial and stochastic aspects of process management problems

    This thesis considers the problem of portfolio selection and task scheduling arising in research and development (R&D) pipeline management, where several projects compete for a limited pool of various resource types. Each project (product) usually involves a precedence-constrained network of testing tasks prior to product commercialization. If the project fails any of these tasks, then all the remaining work on that product is halted and the investment in the previous testing tasks is wasted. Further, there is significant uncertainty in the task duration, task resource requirement, task costs/rewards and task success probabilities. A two-loop computational architecture, Sim-Opt, which combines discrete event simulation and mathematical programming, has been developed by viewing the underlying stochastic optimization problem as the control problem of a performance-oriented, resource-constrained, stochastic discrete event dynamic system. Sim-Opt introduces the concept of a time line, which is a controlled, simulated trajectory that represents a specific combination of the realization of the various sources of uncertainty in the system. Multiple time lines are explored in the inner loop of Sim-Opt to accumulate information, which is subsequently used in the outer loop to obtain improving solutions to the system. Methods have been developed to integrate information from the inner loop with respect to portfolio selection and resource management. Industrially motivated case studies have been investigated using Sim-Opt to evaluate the effectiveness of different policies of operation, to evaluate the value of outsourcing of resources, and to obtain improving solutions in the outer loop. 
    Basic algorithmic and software engineering methods are described that significantly improve the performance of formulation generation and the generation of a heuristic lower bound, along with the identification of cut families for the effective application of branch-and-cut solution methods. Lastly, the data complexity of the pipeline problem has been addressed by defining an XML-based structured input language for modeling the data needs in a formatted and extensible manner. This thesis demonstrates the benefit of explicitly viewing the R&D pipeline as the control problem of a discrete-event dynamic system and the effectiveness of Sim-Opt as a practical approach for addressing stochastic optimization.

    GaSPing for Utility

    High-consequence decisions often require a detailed investigation of a decision maker's preferences, as represented by a utility function. Inferring a decision maker's utility function through assessments typically involves an elicitation phase, where the decision maker responds to a series of elicitation queries, followed by an estimation phase. In practice, the state of the art for estimation in direct elicitation approaches is either to fit responses to a parametric form or to perform linear interpolation. We introduce a Bayesian nonparametric method involving Gaussian stochastic processes for estimating a utility function from direct elicitation responses. Advantages include the flexibility to fit a large class of functions, favorable theoretical properties, and a fully probabilistic view of the decision maker's preference properties, including risk attitude. Through extensive simulation experiments as well as two real datasets from management science, we demonstrate that the proposed approach results in better function fitting.

    Ordinal Historical Dependence in Graphical Event Models with Tree Representations

    Graphical event models are representations that capture process independence between different types of events in multivariate temporal point processes. The literature consists of various parametric models and approaches to learn them from multivariate event stream data. Since these models are interpretable, they are often able to provide beneficial insights about event dynamics. In this paper, we show how to compactly model the situation where the order of occurrences of an event’s causes in some recent historical time interval impacts its occurrence rate; this sort of historical dependence is common in several real-world applications. To overcome the practical challenge of parameter explosion, which arises because the number of potential orders is super-exponential in the number of parents, we introduce a novel graphical event model based on a parametric tree representation for capturing ordinal historical dependence. We present an approach to learn such a model from data, demonstrating that the proposed model fits several real-world datasets better than relevant baselines. We also showcase the potential advantages of such a model to an analyst during the process of knowledge discovery.